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Abstract —In this paper we derive an online estimator for 
sparse parameter vectors which, unlike the LASSO approach, 
does not require the tuning of any hyperparameters. The al¬ 
gorithm is based on a covariance matching approach and is 
equivalent to a weighted version of the square-root LASSO. 
The computational complexity of the estimator is of the same 
order as that of the online versions of regularized least-squares 
(RLS) and LASSO. We provide a numerical comparison with 
feasible and infeasible implementations of the LASSO and RLS to 
illustrate the advantage of the proposed online hyperparameter- 
free estimator. 

I. Introduction 

Estimating a high-dimensional sparse vector of parameters 
with a few dominant or nonzero elements has become an 
important topic in statistics and signal processing. Applications 
of sparse estimation include spectral analysis ID-EI, array 
processing 0-0, biomedical analysis m-m, magnetic 
resonance imaging CD, CD, system identification CD-CD 
and synthetic aperture radar imaging 111 81 . fl9l . 

Many sparse estimation approaches can be implemented 
using various computational methods and it is relevant to 
formulate estimators that scale well with the size of the data. 
Furthermore, in several applications data is obtained as a 
stream of measurements and it is desirable to process them 
accordingly. Both reasons motivate developing estimation 
methods that perform ‘online’ processing, that is, successively 
refining the estimate of the sparse parameter vector for each 
obtained data sample. Another common issue with sparse 
estimation methods is the need for the user to select or 
tune critical hyperparameters to strike a balance between data 
fidelity and sparsity so as to fit a particular measurement 
setup EoJ, ED. This selection is, however, rarely feasible 
in online scenarios. Furthermore, when the user has to tune 
hyperparameters the outcomes become more arbitrary and 
the reproducibility of the method is reduced. Finally, many 
convex relaxation-based sparse estimation methods are not 
well adapted for complex-valued data and parameters and thus 
they must separate the data into real and imaginary parts. 
This separating approach requires enforcing pairwise sparsity 
constraints to avoid performance loss and effectively doubles 
the size of computed quantities fl22ll . | |23| . 

In this paper we develop a sparse estimation method that 
addresses the aforementioned issues. Specifically, 

• the estimator is implemented online with the same com¬ 
plexity order as the best existing online methods for 
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sparse estimation. 

• It automatically adapts to the signal model via a covari¬ 
ance matching approach and in this way obviates the need 
for tuning hyperparameters. 

• The method can estimate complex-valued parameters as 
simply as real-valued ones. 

Notation: || • || i, || • ||2 and || • ||jr denote the l\, £2 and 
Frobenius norms, respectively. Unless otherwise stated, || • || 
will denote the £ 2 -norm and ||x|| w = Vx' Wx where W >- 0 
is a positive definite matrix. [T]j is the ith column of matrix 
r and T ' is the Moore-Penrose pseudoinverse. 

Abbreviations: Least squares (LS), regularized least-squares 
(RLS), least absolute shrinkage and selector operator (Lasso), 
sparse iterative covariance-based estimation (SPICE), mean- 
square error (MSE), online (Ol). 

II. Background 

Consider a sequence of scalar measurements: 

y t = h*e + w t €C, t = 1,2,, (1) 

where the regressor vector h, £ C p is given, the unknown 
sparse parameter vector is 9 £ C p and w t is zero-mean noise 
with variance a 2 . For the sake of generality we consider 
complex-valued variables; any differences that occur in the 
real-valued case will be addressed below. 

Suppose we have obtained n measurements. Then in vector 
form we can write 

Yn = H„0 + w n £ C", (2) 

where 


To avoid notational clutter we will omit the superindex for the 
columns and simply write c i. In the following sub-sections we 
review a few estimators Q n of 6 in (0, based on regularizations 
of the least-squares approach, and their online formulations 
that compute 0 n from 9 n -i thus eliminating the need for re¬ 
calculating the estimate from scratch. 

A. LS and RLS 

The LS approach is based on solving the quadratic problem 

m-m 

argmin ||y n — H n 0||2, (3) 

sec? 
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which has the following minimum 1 2 -norm solution G n = 
Hjj y n . If H„ has full column-rank, then the estimator admits a 
simple closed-form solution that can be computed by recursive 
updates for n = 1,2,.... This obviates the need for choosing 
an initial estimate G 0 or any hyperparameter (see, e.g. l27ll ). 

It is more common to consider a regularized LS problem 

argmin ||y„ - H n 0||2 + ||0 ||a, (4) 

with an initial estimate Gq = 0 that is well-motivated for 
sparse parameter vectors and with A = AI p y 0, where A is a 
hyperparameter chosen by the user to bias the estimate towards 
0 with the aim of reducing its variance. This estimator admits 
an online form G n = 1 + K n (y n - h*0„_i), where K„ 

is a matrix determined from the regressors and A (26) . The 
computational complexity of this .^-regularized least-squares 
algorithm is of the order 0(p 2 ) per sample. 

One approach that takes sparsity into account would be to 
perform online Rls estimation only on the nonzero compo¬ 
nents of G , if these were known. In |28) the components are 
successively detected in a greedy manner using information 
theoretic criteria at each sample y n . Since the detection 
process is subject to errors, the resulting online sparse least- 
squares estimate is only an approximation of © applied to 
the subvector of nonzero coefficients. 

B. Lasso 

A substantially different approach than Rls consists of re¬ 
placing the ^ 2 -norm regularization term in (0} with alternative 
forms that promote sparsity directly in the objective itself 129) . 
In doing so, sparse solutions can be obtained without the need 
for concomitantly detecting the nonzero components of 6. This 
approach to sparse parameter estimation was popularized in 
HE ESI- The LASSO estimator solves the following convex 
problem 

argmin ||y„ - H„0||| + A„||0||i. (5) 

0£R!> 

While the solution G n does not have a closed-form expression, 
it can be computed using various numerical methods. Among 
the more computationally elegant and scalable methods is 
the cyclic minimization strategy of coordinate descent which 
updates one element of 9 n at a time in an iterative manner, 
cf. EE E2 and references therein. 

One way of formulating an online solution is to interpret 
(|3 as a penalized maximum likelihood estimator, assuming 
Gaussian noise in ©. Then it is possible to formulate an iter¬ 
ative expectation maximization algorithm with recursively up¬ 
dated quantities using auxiliarly variables 1331. The drawback, 
however, is that an additional hyperparameter, besides A„ in 
©, needs to be tuned. Another way of dynamically updating 
the estimate 0 n -i from (),, is the method of homotopy (34), 
031 . whereby the cost function in © with a fixed A„ = A 
is modified into ||y n _i - H Tl _ 1 0||| + e\y n - h;0| 2 + AH^Hi.. 
As the scalar parameter e £ [0,1] is varied from 0 to 1, the 
transition from G n _ 1 to G n can be computed more efficiently 
than recalculating G n from scratch thereby enabling an online 
formulation. 


For the real-valued case, an elegant online formulation is 
found in P6l . which is based on the cyclic minimization 
strategy mentioned above. The cost function in © can be 
written equivalently as G T T n G — 20 T p” + A n ||0||i, ignoring 
any constant, where F n = H*H„ and p n = H*y„ can be 
computed recursively. Then, starting from an initial estimate 
0(i, the elements of 0 „ are updated for each sample by solving 

0i = argmin T™9 2 - 2 p™0i + A„|0j| 

s, 

in closed form for i = 1,72, where p" = p" — r ^Gj 

and 9j denotes the current estimate. The complexity of the full 
online cyclic minimization LASSO is 0(p 2 ) per sample. 

Under certain conditions on the regressors, sparsity of G, 
and noise, it is possible to prove that the LASSO estimator 
possesses ‘oracle’ properties. That is, asymptotically it can 
identify the support set of G and perform as well as Rls 
applied to the nonzero coefficients of the parameter vector, 
cf. ll36l - [[39) . This, however, requires selecting the hyperpa¬ 
rameter A„ based on the knowledge of the noise variance <r 2 
which is rarely feasible in practical (online) scenarios. 

C. Square-root LASSO 

To circumvent the need to know 0 in the LASSO, a subtle 
modification of © was proposed in (40) . 

argmin ||y„ - H„0|| 2 + A„||0||i, (6) 

eeR p 

where the first term, containing the residuals, is the square- 
root of that in ©. As argued in (40) . near-oracle performance 
for both © and © can be achieved when \ n is chosen 
as the smallest value that dominates the gradient of the 
first term, when evaluated at the true G. At this point, the 
gradient captures the estimation errors arising from noise 
alone. However, by re-parameterizing © as y„ = H n G + oe, 
where E[ee \ = I„, it is seen that the gradients of the first 
terms in © and © differ in one crucial respect; namely the 
latter does not depend on <7 thus rendering the choice of A„ 
for © invariant to the noise level. 

Another way to address the dependence on a is to estimate 
it ED. The square-root Lasso estimator in © can in fact be 
interpreted as an £ 1 -penalized joint estimator of G and er used 
in robust regression. Suppose 7 (-) is a convex loss function of 
the normalized residuals (y t — h, 1 G) / 0 . Then the concomitant 
M-estimator of location and scale, G and o, is given by li42l 
ch.7] 

1 

argmin — 
e,a n 

where a > 0 is a user-defined parameter. In robust regression, 
various loss functions are considered to mitigate noise outliers. 
For a squared-error loss 'y(x) = x 2 , we obtain the minimizer 
^ = ||y n - H„0|| 2 /\fna in closed form. Penalizing the 
M-estimator in © by A n ||0||i and concentrating out the 
minimizing o with a = n /4 yields ©. 

While an efficient choice of X n in © is independent of < 7 , 
the user input is still required; furthermore, the choices of \ n 
in ED are predicated on the assumption that each column of 


. yt - h G , , 

7 I - I + a 


(7) 





3 


H„ has unit norm. Such a rescaling of the regressors may 
not be practical in an online scenario. Note that a cyclic 
minimization algorithm for the convex square-root Lasso has 
been presented in the supplementary material of fifil (albeit 
only for the real-valued case and without any derivation) but 
an online implementation has not yet been formulated. 


D. SPICE as weighted square-root LASSO 

Let us now consider the estimation problem from a statisti¬ 
cal point of view. Suppose 0 is a zero-mean random variable 
with covariance matrix P >- 0. Then the linear estimator that 
minimizes the mean square error E yi g[||0 — ||§] is obtained 

by solving 

argmin — ||y„ - H„0||| + ||0||p_i, (8) 

e & 

and can be written in closed form as l25l , l26l . l43l 

9 n = PH* (H„PH* + cr 2 I„) -1 y„ 

= (HJjH n + <7 2 P -1 ) -1 H* yil . 

In the problem under consideration, however, neither P nor cr 2 
is known. By treating them as unknown parameters, they can 
be estimated by a covariance-matching approach (e.g., El, 
B31 ) and then used in (0. 

For reasons of parsimony and tractability we do not model 
any correlations between the elements of 0 and hence P is 
a p x p diagonal matrix. Now consider the covariance matrix 
of the data R„ = E[y n y*J = H„PH* + cr 2 I„, which is a 
function of P and cr 2 . We choose these p + 1 nonnegative 
parameters to match the covariance of the observed data, by 
minimizing the criterion 

l|R^ 1/2 (yny; - Rn)||F, 

with respect to P and cr 2 . This criterion is the basis of the 
sparse iterative covariance-based estimation (SPICE) frame¬ 
work. 

Using this covariance-matching approach is equivalent to 
solving for the parameters jointly in the following augmented 
problem 

argmin \\\y n - H n 0|| 2 + ||0||p-i 

8.P ,<r> cr 2 (10) 

+ tr {H„PH* + cr 2 I„} , 

which is similar in form to © but contains the additional term 
tr{R„} = tr {H„PH; + cr 2 I„}. (See Appendix A for a proof 
of this equivalence.) Furthermore, following 11461 - 1481 it can 
be shown that solving for P and cr 2 , and concentrating them 
out from ([Tol l, results in 


argmin ||y„ - H„0|| 2 + ||D„0||i, 

0eO 


( 11 ) 


where 


D n = diag 



II2 


Eq. ( 1111 can be interpreted as a weighted, hyperparameter- 
free square-root LASSO. As is the case with the square-root 
LASSO, online formulations of ( fill have not appeared in the 
literature. 


E. Problem formulation 

We have reviewed several approaches to sparse parameter 
estimation as well as some of their interconnections and limita¬ 
tions. Note that all of the estimators considered above involve 
convex minimization problems. The and l\ -penalized forms 
of 0 in © and © have concise online formulations but 
require the careful selection of hyperparameters. Furthermore, 
an efficient choice depends on the unknown noise power cr 2 . 
The hyperparameters choice is rendered invariant to <r 2 by the 
change in ©. Moreover, this selection is entirely avoided in 
(fill) using the SPICE approach. 

The goal of the remainder of the paper is to formulate 
an online SPICE estimator for the sparse vector 0 (see (ini ') 
given data {j/i,h t }" =1 . This estimator, denoted ‘Ol-Spice’, 
obviates the need for user-defined hyperparameters, treats the 
complex-valued case as simply as the real-valued one, and is 
of the same complexity order as the online solutions of © 
and ©. In the numerical example section we provide results 
comparing the aforementioned online estimators, viz. Ol-Rls, 
Ol-Lasso and Ol-Spice. 

III. Online Spice 

First we formulate a low-complexity cyclic minimization 
algorithm for the cost function in (HtI . Then, using this result 
we derive an online estimator which sequentially processes a 
stream of data with complexity 0(p 2 ) per sample. 

A. Cyclic minimization 

Let the cost function in (fill be denoted as J(0) = ||y„ — 
H. rl 011 2 -F11Dn 011 1 ■ We begin by minimizing J(0) with respect 
to one component 9i at a time. Let y* = y — Yk^i c k@k 
(omitting the index n to lighten the notation); then the cost 
function can be re-written as 

J(0i) = (||y< - a0i\\l) 1/2 + dulOi I + K, (12) 

where du is the zth diagonal element of D„ and K = 
Yk^i dkk\6k\ is a constant. To tackle this scalar minimization 
problem we reparameterize the ith variable in polar form 0, = 
where r-j > 0 and ipi £ [— tv, tt) (or <pi £ {0, 7 r} when 
Oi is real-valued). This enables the following reformulation of 
the quadratic term in (IT2l ): 

l|y* - chilli = ||y* - Cine^Wl 

= ||y*||i + IWine^Wl - 2Re{r i c*y i e _Wi } 

= l|y*lli + INlik 2 

- 2r»|c*yi| cos(arg(c*y») - w). 

(13) 

Inserting (IT3l) into J(0,;) and noting that \9i\ = \rie J<Pi \ = rt, 
we obtain the following criterion as a function of r\ and tpi, 

= (llyilli + IMli ?", 2 ^ 2rj|c*yi| cos(arg(c*yj) - pf)) 1 ' 2 
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The minimizing pi is simply 

<Pi = arg(c*y;), (15) 

whether the data is complex or real-valued. 

Next, let 

«< = lly*ll 2 

Pi 4 || Cl || 2 (16) 

a - Ky*|, 

so that we can write (fl4l> as 

= (a, + ftr 2 - 2'y i r i ) 1/2 + dun. 

Note that by the Cauchy-Schwarz inequality 

OLiPi ~ if > 0. 

Equality in (fl 8 l > occurs only when y, is colinear with c ? . 
Inserting (0 into yone obtains y, = c fc$fc + c + w, 

where 8k denote estimation errors when holding the remaining 
coefficients constant. Due to the random noise w, y» and c, 
will not be colinear, making the inequality (fl 8 l > strict, with 
probability 1 . 

We now show that ( fTTb is convex and derive the minimizing 
r > 0 of this function (dropping the index i, in what follows, 
for notational simplicity). The first-order derivative is 

dJ Pr — 7 

dr- (/3r 2 — 2jr + a) 1 ^ 2 


(17) 


(18) 


d, 


(19) 


where the quadratic expression in the denominator can be 
factored as 


pr 2 — 27 r + a = P 


a 


7 


( 20 ) 


Given the strict inequality in dT 8 l ) it follows that the right-hand 
side of < [ 20 b . and therefore the denominator of (fl9l >. is strictly 
positive. The second-order derivative can be expressed as 

d 2 J _ _ p _ (pr - 7 ) 2 

dr 2 (p r ‘ 2 — 27 r + a ) 1//2 (Pr 2 — 27 r + ct ) 3//2 

(P ( Pr 2 - 27 r + a) - (Pr - 7 ) 2 ) 


(Pr 2 — 27 r + a ) 3 ^ 2 
1 


(ap - 7 2 ) . 


(Pr 2 — 27 r + a ) 3 ^ 2 

Note that the above equation is positive, in view of ( ITSl ). Thus 
the function (fTTb is convex and the minimizer r > 0 is given 
by its stationary point; or else r = 0 . 

The stationary point, for which dJ/dr = 0, can be found 
by solving (see (fl~9b): 

(pr — 7 ) = —d(Pr 2 — 27 r + a ) 1 ^ 2 , 

which leads to the condition 0 < r < ^ given that both 
sides must be negative. By squaring both sides of the above 
expression we can write 


P 2 


7 


= d 2 p 


7 


using ( f 20 b . or 

7 \ 2 d 2 (ap — Y 

r ~ p) = j^p\r^~ 

where d 2 /(P~d 2 ) = (||c|| 2 /n)/(||c|| 2 -||c|| 2 /n) = l/(n-l). 
Therefore for n > 1, we can write the solution more compactly 
as (reinstating the dependence on i): 

f - 7« 1 ( aiPi-lf \ 1/2 (21) 


n — 1 

given the fact that 0 < n < ^ i . 

Finally, we summarize the element-wise minimizer of (IT2b 

as . _ _ 

n (new*, if sjn - l 7 i > y/atiPi - if 

= \o, else (22) 

using (fTTb and (ITTb . Updating each element 6 t while holding 
the remaining elements constant will monotonically reduce the 
convex cost function (1 1 2b . Thus repeating (f22l) for i = 1,... ,p 
results in a computationally simple cyclic minimizer. 

B. Online formulation 

We now derive a method for efficiently computing (fTTb . 
given the current estimate which we denote by 6. at any n, 
for notational simplicity. At n = 0, the estimate is initialized 
as 9 = 0. We note that the variables in ( fTTb and (fTTTb depend on 
quantities whose dimensions grow with n; namely, y, and c,. 
By introducing recursively computed variables we derive an 
estimate update that keeps the complexity and memory storage 
constant at each sample and is of the same complexity order 
as online Rls and LASSO. 

First, we introduce the auxiliary variable z n = y„ H„ 0, 
which will subsequently be eliminated as we proceed in the 
derivation. Then we can write the following identity y?; = 
z„ + cfhj, which enables the variables in ( 1 1 6[ i to be expressed 
as 

«* = llyill 2 

+ c i^i || 2 

l | 2 + ll c z|| 2 |0i | 2 + 2Re{0*c*z n } 


(23) 


Pi = 

li = |c-yi| 

= |c*(z„ + cA)\. 

Next, introduce the auxiliary variables 
In ± IKII 2 
Cn = h; z „ 

and the recursively computed variables 

r n = h;h„ = r”- 1 + h„h; 

P n = H*y n = p™ -1 + h n y n 
«' n = y* n yn = K n - 1 + \yn \ 2 , 

that are initialized as 0. Then d23l > can be simplified as follows: 

a i = Vn + ri\e i \ 2 + 2Rc{e*Q} 

Pi = rs 

n = \(i + r?M 


(24) 


(25) 


(26) 
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where Q denotes the /th element of C, n . Similarly, ( IT5 ] i can be 
expressed as 

ft = aig(Ci+1 1A). (27) 

Therefore the computation of (l22l > can be expressed in terms 
of (l24l >. ( |25] > and the current estimate (1,. 

Once Oi has been computed, the current estimate must be 
updated along with the auxiliary variables to compute the 
subsequent coefficients of 0. The variable z n can easily be 
updated as z' n = z n + ci{9i — 9i), and it follows that the 
update of (l24l > equals 

V'n = IK II 2 

= Vn + Tl\e, - 0,| 2 + 2Re{((?i - OiYQ 

C = 

= c„ + [r n ]i((?i-(9i), 


Remark 2. The original SPICE batch algorithms (3J, 0, with 
uniform noise variance, and the above online formulation solve 
the same convex problem iteratively. The former uses an initial 
batch estimate whereas the latter is initialized by setting 9 = 
0. A more important difference, however, is that the former 
requires repeated inversions of n x n matrices, each of which 
is of complexity 0(n 3 ), whereas the latter requires none. This 
renders batch SPICE intractable when n takes on large values 
(such as n > 1000 for a regular PC) and precludes its use in 
scenarios considered in this work. 

Remark 3. We note that the approach employed to derive Ol- 
Spice also enables an alternative formulation of Ol-Lasso 
that treats the complex-valued case as simply as the real-valued 
one. See Appendix B for a derivation. 

IV. Numerical evaluation 


which involves a small number of scalar operations and an 
addition of two p x 1 vectors. The variable z n can now be 
eliminated, initializing the auxiliary variables <® for each 
sample n as y n = n n + 6 T n G — 2Re{0 p”} and = 
p" — T n 0. We summarize Ol-Spice in Algorithm D] The 
algorithm specifies the update of the estimate for each new 
sample y n and is initialized at n = 0 by 0 = 0. The cyclic 
computation of i = 1..... p is terminated after L > 1 
repetitions per sample, cf. line 14 in Algorithm Q] 

In sum, by introducing the auxiliary variables we can 
maintain constant storage and a computational complexity 
of order 0(Lp 2 ) per sample. Since L > 1 is a constant 
independent of p, this is the same complexity order as online 
Rls and LASSO. As reported below, L = 1 performs well in 
practice. Other update strategies, akin to those considered in 
ll36l . can be explored in online applications where complexity 
needs to be further reduced. 


Algorithm 1 : Online Spice 

1: Input: y n , h„ and 6 

2 : r := r + h n h* 

3: p := p + h n y n 
4: K := K+ \y n \ 2 
5 : rj= k + G*TG ~2Re{G*p} 

6: C = P - 

7: repeat 

8: i = 

9: Compute d26l) and (f27l) 

10 : Compute 9i using (l22l ) 

11 : 7? := y + r«|& -§i I 2 + 2 Re{(& - 0 4 )*Ci} 

12: C “ C'+ [T]i(0i - 0i) 

13: 9i ■- Gi 

14: until number of iterations equals L 
15: Output: G 


Remark 1. At any n, the output of the algorithm converges 
to the global minimizer ( fill as L —> oo which follows from 
the above analysis of the convex minimization problem. A 
convergence analysis for finite L and n —> oo is, however, 
nontrivial, cf. (25] ch. 9]. 


In this section we compare the derived Ol-Spice with 
feasible and infeasible versions of the Ol-Rls and Ol-Lasso 


The infeasible Ol-Rls is implemented by processing only 
the (unknown) subset of nonzero coefficients in G , whereas 
the feasible Ol-Rls processes the entire vector, with the 
regularization parameter set arbitrarily to A = 1. The infeasible 
Ol-Lasso is implemented by setting X n = y/2cr 2 n log p, 
which is proportional to the (unknown) noise level |36l , 
whereas for the feasible Ol-Lasso we set X n = \/n log p. 

The performance of the estimators was evaluated using the 
normalized mean-square error 


NMSE = 


E„, fl [||g-e||i] 

e*[||0|H] 


(28) 


When G is an unknown deterministic variable, the expectation 
with respect to it in d28l > should be be removed. Note that 
an NMSE value below 0 dB quantifies the error reduction 
from the initial guess 6 = 0. The NMSE was evaluated using 
100 Monte Carlo simulations. We used a PC with Intel i7 
3.4 GHz CPU and 16 GB RAM. The algorithms were imple¬ 
mented in Matlab without any special code optimization or 
hardware acceleration. 

Remark: In the interest of reproducible research 
we have made the codes for Ol-Spice, as well as 
for the presented numerical experiments, available at 

https://www.it.uu.se/katalog/davza513. 


A. Real-valued example: random regressors 

To illustrate the performance of the estimators we consider 
a scenario with the real-valued regressor elements h t in £[]) 
drawn from identical and independent Gaussian distributions 
(i.i.d.) with zero mean and unit variance. The signal to noise 
ratio is defined as 

SN R= min ^E, 1 |9.| i l, 

a 1 

where S is the support set of 9 £ M. p and p = 500. We use 
Gaussian noise throughout all experiments. 

We first consider G to be a deterministic parameter. In the 
first experiment we set p* = 3 nonzero elements, Om = 1, 
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SNR [dB] 


Fig. 3. IID regressors and deterministic 0. NMSE versus SNR. n = 250 
samples and p * = 3. The asterisk denotes the infeasible Ol-Lasso. 



Fig. 4. IID regressors and deterministic 0. NMSE versus n. SNR=20 dB 
and p* = 3. The asterisk denotes the infeasible Ol-Rls. 


(? 2 o = 1 and 0i4o = 3. Note that since the regressors are 
drawn independently, the chosen support set for 6 will not 
affect the performance. When the number of samples is very 
small, the estimates for Ol-Lasso and OL-SPICE have a 
higher variance than Ol-Rls which biases its estimate more 
strongly towards 0. For clarity we therefore set 6 n = 0 during 
0 < n < 20 for all estimators. The results are shown in 
Fig. H] for SNR=20 dB. We observe a significant performance 
gap between the feasible and infeasible Ol-Lasso, which 
illustrates how critical it is to tune the hyperparameter X n to 
the unknown noise variance a 2 . Both Ol-Lasso and Ol- 
Spice quickly prune out many nonzero coefficient estimates, 
the effect of which is visible in the transition phase of the 
NMSE plot. The performance of Ol-Rls becomes better than 
that of Ol-Lasso when n > 750. Ol-Spice outperforms the 
feasible Ol-Lasso after about n = 100 samples and is closer 
to the infeasible version which uses an optimally tuned A n . 

Fig. [2] presents the variance and bias of the estimators by 
decomposing the mean square error in (f28l) . Both versions 
of Ol-Lasso exhibit much lower variance than square-bias 
whereas Ol-Spice has a more balanced variance-bias com¬ 
position and noticeably the lowest bias among the considered 
estimators. 

In Fig. [T| we see that the NMSE for feasible Ol-Rls 
and Ol-Lasso, which is dominated by the bias, remains 
virtually unaffected by increasing SNR for a fixed number of 
samples n = 250. By contrast, the errors for Ol-Spice and 
the infeasible Ol-Lasso decrease as the signal conditions 
improve. 

Next, we study the effect of the number of iteration cycles L 
per sample in Ol-Spice. The results are illustrated in Fig. [4] 
We note that the performance characteristics for L = 1, 10 
and 100, are very similar. For n < 120, a larger L leads 
to slightly faster decrease of the NMSE but the differences 
rapidly diminish as L increases and the NMSE curves almost 
coincide for n > 200. For reference we included the infeasible 
Ol-Rls, which provides a lower bound on the NMSE in Fig. [4] 


and can be compared to Fig. Q] 

In the next experimental setup we set the number of nonzero 
elements to p* = 50, 0i = • • • = 050 = 1, thus increasing 
element density of 0 to 10%. The results in Fig. [5] show that 
Ol-Spice can better cope with less sparse parameter vectors 
than Ol-Lasso. For n > 300 it exhibits lower NMSE than 
the rest, owing to a lower bias (not shown here but observed 
by us in the numerical evaluation). 

Finally, we consider a setup where Q is a random parameter. 
Since the support set is unimportant in the present case we 
generate the elements 0io, 020 and 0 i 4 o using independent 
Gaussian variables with zero-mean and unit variance, resulting 
in a wider dynamic range than in the previous experiments. 
Nevertheless, the results presented in Fig. [6] show performance 
characteristics similar to the deterministic case presented in 

Fig. D] 


B. Real-valued example: sinusoids in noise 

In contrast to the previous example, we now present a 
case where the regressor columns in (0 are highly correlated. 
Specifically, as a further example with real-valued parameters, 
we consider the identification of a sum of q sinusoids at given 
frequencies {wj} C [0,7r) with unknown phases {</>,} and 
amplitudes { a ,} (most of which are zero). In the following we 
will consider q = 250 possible sinusoids on a uniform grid of 
frequencies. We set two nonzero amplitudes as aio = 1 and 
a 20 = 1 for two slowly-varying sinusoids, narrowly spaced 
with Aw = 0.047T, and <2140 = 3 for a high-frequency sinusoid. 
The phases of the three sinusoids were set to 0. 

We define the signal to noise ratio as 


<j z 

where S is the set of nonzero amplitudes, and parameterize 
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Fig. 1. IID regressors and deterministic 0. NMSE versus n. Left: n = 1 to 10 3 . Right: n = 1 to 10 4 . SNR=20 dB and number of nonzero parameters 
p * = 3. The asterisk denotes the infeasible Ol-Lasso. 




Fig. 2. IID regressors and deterministic 6. Left: variance versus n. Right: square-bias versus n. SNR=20 dB and p * = 3. 




Fig. 5. IID regressors and deterministic 6. NMSE versus n. Left: n = 1 to 10 3 . Right: n = 1 to 10 4 . SNR=20 dB and p* = 50. 







































































































































































Fig. 6. IID regressors and stochastic 6. NMSE versus n. Left: n = 1 to 10 3 . Right: n = 1 to 10 4 . SNR=20 dB and p* = 3. 


the signal as 

9 

yt = ^2 a i sin ((Jit + (j>i) + w t 

i =1 
9 

= ^ Ai cos (ujit) + Bi sin(w,t) + w t 

i=l 

= h Je + w t , 

where the unknown parameter vector is 6 = 

[Ai B\ ■ ■ ■ A q B q ] T £ R p and p = 2q = 500. The regressor 
vector is = [cos(wii) sin(wi t) ■■■ cos (oj q t) sin(w g i)]. 

We set SNR=20 dB. First, Ol-Spice is compared with the 
feasible and infeasible Ol-Lasso which perform substantially 
different from one another but achieve the same rate of NMSE 
decrease. The results are presented in Fig. [7] For n < p, Ol- 
SPICE overtakes the feasible Ol-Lasso at about n = 300. 
Notably, the NMSE of Ol-Spice decreases until it reaches a 
plateau where the estimation errors are very small but where 
the noise level cannot be properly identified. This interesting 
transition characteristic still awaits a satisfactory explanation. 
For n > p, Ol-Spice approaches the infeasible Ol-Lasso as 
time progresses. For reference, we have also added the feasible 
Ol-Rls which illustrates the degradation when not taking the 
parameter sparsity into account. Note, however, that Ol-Rls 
eventually outperforms the Ol-Lasso estimator for which the 
hyperparameter has not been finely tuned to the noise level. 
Next, Fig. [8] illustrates how L affects Ol-Spice. We see that 
the performance characteristics for L = 1, 10 and 100, are 
very similar as was the case with weakly correlated regressor 
columns in Fig. Q] Setting L = 1, however, requires slightly 
more samples to reach the plateau resulting in a gap in NMSE 
compared to L = 10 until about n > 375. 

C. Complex-valued example: synthetic aperture radar imag¬ 
ing 

Finally, we illustrate how Ol-Spice performs in a complex¬ 
valued case, and compare it with Ol-Rls and a novel form 
of Ol-Lasso for this scenario, cf. Appendix B. 



Fig. 7. Sinusoidal parameters 6. NMSE versus time. SNR=20 dB. The 
asterisk denotes the infeasible Ol-Lasso. 



Fig. 8. Sinusoidal parameters 6. NMSE versus time. SNR=20 dB. The 
asterisk denotes the infeasible Ol-Rls. 
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10 15 20 25 30 


Fig. 9. True intensity or reflection coefficient, 0 (p), as a function of p = 
[Px,Py] T ■ 10 ideal point-scatterers are present. 


We consider a setup similar to that of synthetic aperture 
radar imaging where an antenna transmits an electromagnetic 
pulse and the reflected signal carries information about po¬ 
tential scatterers in the scene of interest, cf. (49). Let p be 
a position coordinate in the scene and 0(p) the reflection 
coefficient at p. The observed signal is in the spatial frequency 
domain, where each sample corresponds to a particular angle 
4>. If we grid the space of the scene, the signal at sample t 
can be modeled as 

y(cf> t ) = Y / e- j2npTs{ ‘ l>t) 0(p) + ™t 

p 

= h* t e + w t , 

where 9 £ C p is the vectorized image of reflection coeffi¬ 
cients. For simplicity, we consider g(</>) = <f> £ [0, l ) 2 and 
p £ R 2 such that the observation y(4> t ) corresponds to a 
coefficient of the two-dimensional discrete Fourier transform. 
Here we consider the discretized scene image to be 32 x 32 
such that p = 1024. The true image used in this example 
is shown in Fig. [9] which comprises 10 point scatterers with 
amplitudes equal to 1 . 

The observations at each sample t were taken at a randomly 
chosen angle <p t (corresponding to randomly chosen discrete 
spatial frequencies). The signal to noise ratio was set to 25 dB. 
In Fig. [lO] we compare the estimated images using Ol-Rls, 
Ol-Lasso and OL-SPICE. Note that in this signal setup the 
hyperparameter in the infeasible Ol-Lasso overpenalizes the 
i-[ -norm of 6 which results in no visible scatterers. To produce 
some meaningful plots for Ol-Lasso, the hyperparameter is 
adjusted to A n = 10~ 2 \/nlogp, which illustrates the difficulty 
of selecting it in practical applications. For t close to p = 
1024, three methods estimate the locations and intensities of 
the point scatterers accurately, but OL-SPICE is capable of 
producing accurate images with far fewer samples than the 
other two methods which would require fine-tuning. Indeed, 
the scatterer pattern is already visible at t = 128 samples in 
the Ol-Spice image, without any user input. 


V. Conclusions 

We have derived an online sparse estimator, called Ol- 
SPICE, that obviates the need for tuning hyperparameters. Its 
computational simplicity and adaptability to complex-valued 
parameters render it suitable for large-scale inference problems 
as well as real-time applications, such as system identification 
and synthetic aperture radar imaging. The code for Ol-Spice 
has been made available to facilitate its use in applications. 

Appendix A: Linear minimum mean-square 
estimator and covariance matching 

Here we prove that the minimizer 9 of ( 1 1 01 1 is equivalent 
to using the linear minimum mean-square estimator © with 
covariance parameters set through covariance matching. 

For notational simplicity, let <j> £ Rft 1 denote the co- 
variance parameters, namely the diagonal elements of P and 
cr 2 , and drop subindex n. Further, let X = er 2 I so that 
R (4>) = HPH* + X. Now © can be written as 

9 = PH’R-'y 

= (H*X _1 H + P _ 1 ) _ 1 H*X - 1 y. 

We note that ([9j» is invariant to any scaling of the covariance 
parameters. That is, 

0(c</>) = cPH*(cR) -1 y = 9{<f>) (29) 

for any c > 0, which follows from R(c</>) = H„(cP)H* + 
cX = cR(</>). Finally note that since © minimizes © it is 
therefore the minimizer 9 of the augmented problem (ITOl) as 
well. 

We proceed by inserting © in this will lead to a 

concentrated cost function that is equivalent to using the 
covariance-matching criterion. First, using the matrix inversion 
lemma, note that: 

y - H0 = y - H(P 1 + H*S _ 1 H) _ 1 H*S _1 y 

= X (X ^ 1 - X _ 1 H(P _1 + H*X _ 1 H) _ 1 H*X _1 ) y 
= X(HPH* + X) -1 y 
= ER-V, 

so that 

||y-H0 ||| J _ 1 + ||0|£-i 

= y*R" 1 XX _1 XR V + y*R _ 1 HPP _ 1 PH*R _1 y 
= y*R _1 (X + HPH*)R _1 y 
= y*R _ 1 y. 

Thus after concentrating out 9 , (ITOl) can be written as 

argmin y*R _1 y + tr{R}. (30) 

</> 

Now expand the covariance-matching criterion, 

l|R" 1/2 (yy* - R)\\f = tr{(yy* - R)R _1 (yy* - R)} 

= tr{yy*R _ 1 yy*} + tr{R} - 2tr{yy*} 
= y*R~ 1 y||y|| 2 +tr{R } + k, 
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Fig. 10. Estimated images |0(p)| at various time instants t for a randomly chosen noise realization. The estimates for Ol-Rls and OL-SPICE are shown in 
the first and fourth columns, respectively. For infeasible OL-LASSO with X n = \j2a 1 n log p and a user-adjusted version A n = 10 -2 \/nTogp, the estimates 
are shown in the second and third columns. SNR=25 dB. 


where if is a constant. The covariance matching problem can 
thus be written equivalently as 

argmin y*R“ 1 y + ||y|r 2 tr{R}, (31) 

which is similar to ( l30l ). Let the cost functions in ( |30| ) and ( 1311 
be denoted as </(</>) and respectively. We now show that 

their respective minimizers differ only by a scaling constant. 


That is, 4> = c4>, where c= llyll 1 > 0. This follows from 
cJ(c4>) = c(y*(cR) _1 y) + ctr{(cR)} 

= y*R V + c 2 tr{R} 

= J'{ct>), 

so that for the minimizer 4> we have cJ(c4> ) = J' (4>) < 
J'(4>) = cJ(c(f>). It follows that J{c4> ) < ) for all 

<i> G R# 1 , and therefore the minimizers for the concentrated 
cost function (l30l > and the covariance-matching crierion (OTb 
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differ only by a factor c > 0. From (l29l > we know that the 
linear minimum mean-square estimator is invariant to uniform 
scaling of the covariance parameters. This concludes the proof. 
(See also li48l for other details of this result.) 

Appendix B: Online Lasso for the complex-valued 

case 

An online cyclic LASSO algorithm that covers both the 
real and complex-valued case can be derived using the same 
reparametrization employed in Ol-SPICE. Analogous to dT2l> 
and the derivation of ( fTTI i. the cost function can be written as 


PLACE 

PHOTO 

HERE 
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J{0i) = IIy* - CiQiWl + X n \9i\, 


and in concentrated form, 

= («i + far? - 27 in) + A n r 

' 27 i - X 


= Pi 


rf- 


Pi 


+ CXi 



1 / 27 ^ - A ra 

2 1 Pi 


2 

+ K, 


where K is a constant and the auxiliary variables can be 
computed as (cf. ©): 


Pi = r u 

7i = ICi + rs0i|. 


The minimizing argument 77 > 0 is given by 


77 = max 


/27 i A n 

V Wx 



Thus we have the minimizer 0, = fie^\ where 


(pi = arg(Ci+r"^). 


(32) 


(33) 


(34) 


Note that the above derivation does not involve the variable 
a,; from Section [III] or the variables 77 , and k" in the online 
formulation of SPICE, cf. (l23l i. (l24t . (l25l > and (l26l >. The result 
is summarized in Algorithm [2] 


Algorithm 2 Online LASSO 

1 : Input: y n , h„ and 9 

2 : r := r + h n h* 

3: p ~ p + h n y n 

4: C = p-T9 
5: repeat 

6 : i = 1, . . . ,p 

1: Compute (TT2l> . (l33l) and ( 1341 

8 : §i = 

9: C := C + [r]i(0j - 6i) 

10: 8i := 0i 

11 : until termination 
12 : Output: 9 
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